EPO - Munich
J 8. Dez. 2003

TITLE OF THE INVENTION

Audio Enhancement in Coded Domain

FIELD OF THE INVENTION

The present invention relates to voice enhancement, and in 
particular to a method and an apparatus for enhancing a coded 
audio signal. 

BACKGROUND OF THE INVENTION 

Improved voice quality created by voice processing DSP 
(Digital Signal Processing) algorithms has been used to 
differentiate network providers. The transfer to packet 
networks or networks with extended tandem free operation (TFO) 
or transcoder free operation (TrFO) will diminish this ability 
to differentiate networks with traditional voice processing 
algorithms. Therefore, operators, who have generally been
responsible for maintaining speech quality for their customers,
are asking for voice processing algorithms to be carried out
also for coded speech.

TFO is a voice standard to be deployed in the GSM (Global 
System for Mobile communications) and GSM-evolved 3G (Third 
Generation) networks, as described in 3GPP TS 28.062 V4.1.0 
(2001-06), "3rd Generation Partnership Project; Technical 
Specification Group Services and System Aspects; In-band 
Tandem Free Operation (TFO) of Speech Codecs; Stage 3 - 
Service Description". It is intended to avoid the traditional 
double speech encoding/decoding in mobile-to-mobile call 
configurations. The key inconvenience of a tandem 
configuration is the speech quality degradation introduced by 
the double transcoding. According to the ETSI listening tests, 
this degradation is usually more noticeable when the speech
codecs are operating at low rates. Also, a higher background
noise level increases the degradation.

When the originating and terminating connections use the same
speech codec, it is possible to transmit the speech frames
received from the originating MS (Mobile Station) transparently
to the terminating MS without activating the transcoding
functions in the originating and terminating networks.

The key advantages of Tandem Free Operation are improvement in 
speech quality by avoiding the double transcoding in the 
network, possible savings on the inter-PLMN (Public Land 
Mobile Network) transmission links, which are carrying 
compressed speech compatible with a 16 kbit/s or 8 kbit/s sub- 
multiplexing scheme, including packet switched transmission, 
possible savings in processing power in the network equipment 
since the transcoding functions in the Transcoder Units are 
bypassed, and possible reduction in the end-to-end 
transmission delay. 

In a TFO call configuration a transcoder device is physically
present in the signal path, but the transcoding functions are 
bypassed. The transcoding device may perform control and 
protocol conversion functions. In Transcoder Free Operation 
(TrFO), on the other hand, no transcoder device is physically 
present and hence no control or conversion or other functions 
associated with it are activated. 

The level of speech is an important factor affecting the
perceived quality of speech. Typically, automatic level control
algorithms are used on the network side; they adjust the speech
level to a desired target level by increasing the level of
faint speech and somewhat decreasing the level of very loud
voices.

These methods cannot be utilized as such in future packet
networks, where the speech travels in coded format end-to-end
from the transmitting device to the receiving device.

Currently the coded speech is decoded in the network and 
speech enhancement is carried out with linear PCM samples 
using traditional speech enhancement methods. After that the 
speech is encoded again, and transmitted to the receiving 
party. 

In WO 01/03317 A1 a coded domain level control method is
disclosed for the GSM EFR speech codec.

However, for the AMR speech codec, for example, level control
is more difficult in the lower modes because the fixed codebook
gain is no longer scalar quantized but is vector quantized
together with the adaptive codebook gain.

SUMMARY OF THE INVENTION 

It is an object of the invention to provide a method and an 
apparatus for enhancing a coded audio signal by means of which 
the above-described problems are overcome and enhancement of a 
coded audio signal is improved. 

According to a first aspect of the invention, this object is 
achieved by a method of enhancing a coded audio signal 
according to claim 1, an apparatus for enhancing a coded audio 
signal according to claim 7 and a computer program product 
according to claim 15. 

According to a second aspect of the invention, this object is 
achieved by a method of enhancing a coded audio signal 
according to claim 2, an apparatus for enhancing a coded audio
signal according to claim 8 and a computer program product
according to claim 15. 

According to a third aspect of the invention, this object is 
achieved by a method of enhancing a coded audio signal 
according to claim 13, an apparatus for enhancing a coded 
audio signal according to claim 14 and a computer program 
product according to claim 15. 

According to an embodiment of the invention, a coded audio 
signal comprising speech and/or noise in a coded domain is 
enhanced by manipulating coded speech and/or noise parameters 
of an AMR (Adaptive Multi-Rate) speech codec. As a result, 
adaptive level control, echo control and noise suppression can 
be achieved in the network even if speech is not transformed 
into linear PCM samples, as is the case in TFO, TrFO and 
future packet networks. 

More precisely, according to an embodiment of the invention, a
method for controlling the level of AMR coded speech is
described for all the AMR codec modes: 12.2 kbit/s, 10.2 kbit/s,
7.95 kbit/s, 7.40 kbit/s, 6.70 kbit/s, 5.90 kbit/s, 5.15 kbit/s
and 4.75 kbit/s. In the modes 12.2 kbit/s and 7.95 kbit/s, the
level of the coded speech is adjusted by changing one of the
coded speech parameters, namely the quantization index of the
fixed codebook gain factor. In the rest of the modes the fixed
codebook gain is jointly vector quantized with the adaptive
codebook gain, and therefore adjusting the level of the coded
speech requires changing both the fixed codebook gain factor
and the adaptive codebook gain (joint index).

According to the invention, a new gain index is found such
that the error between the desired gain and the realized
effective gain is minimized. The proposed level control does
not cause audible artifacts.

Therefore, according to the invention, level control is
enabled also in the lower AMR bit rates (not only 12.2 kbit/s
and 7.95 kbit/s). The level control in the AMR mode 12.2 kbit/s
can be improved by taking into account the required
corresponding level control for the comfort noise level.

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 shows a simplified model of speech synthesis in AMR. 

Fig. 2 demonstrates the effect of a DTX operation on a gain
manipulation algorithm with noisy child speech samples.

Fig. 3 shows a diagram illustrating a response of an adaptive
codebook to a step function.

Fig. 4 shows a non-linear 32-level quantization table of a 
fixed codebook gain factor in modes 12.2 kbit/s and 7.95 
kbit/s. 

Fig. 5 shows a diagram illustrating the difference between
adjacent quantization levels in the quantization table of Fig. 
4. 

Fig. 6 shows a vector quantization table for an adaptive 
codebook gain and a fixed codebook gain in modes 10.2, 7.4 and 
6.7 kbit/s. 

Fig. 7 shows a vector quantization table for an adaptive 
codebook gain and a fixed codebook gain factor in modes 5.90 
and 5.15 kbit/s.

Fig. 8 shows a diagram illustrating a change in the fixed
codebook gain when the fixed codebook gain factor is changed 
one quantization step. 

Figs. 9 and 10 show diagrams illustrating re-quantized levels 
of the fixed codebook gain factor. 



Fig. 11 illustrates values of terms [...] and [...] with male
speech samples.

Fig. 12 illustrates values of terms [...] and [...] with [...]
speech samples.

Fig. 13 shows a flow chart illustrating a method of enhancing 
a coded audio signal according to the invention. 

Fig. 14 shows a schematic block diagram illustrating an 
apparatus for enhancing a coded audio signal according to the 
present invention.

Fig. 15 shows a block diagram illustrating the use of fixed 
gain. 

Fig. 16 shows a diagram illustrating a high level 
implementation of the invention in a media gateway. 

DESCRIPTION OF THE PREFERRED EMBODIMENTS

In the following, an embodiment of the present invention will 
be described in connection with an AMR coded audio signal 
comprising speech and/or noise. However, the invention is not
limited to AMR coding, and can be applied to any audio signal
coding technique employing indices corresponding to audio
signal parameters. For example, such audio signal parameters
may control a level of synthesized speech. In other words, the
invention can be applied to an audio signal coding technique in
which an index indicating a value of an audio signal parameter
controlling a first characteristic of the audio signal is
transmitted as coded audio signal, in which this index may
also indicate a value of an audio signal parameter controlling
another audio signal characteristic such as a pitch of the
synthesized speech.

The adaptive multi-rate speech codec (AMR) is presented to the
extent necessary for illustrating the preferred embodiments.
Further information can be found in 3GPP TS 26.090 V4.0.0
(2001-03), "3rd Generation Partnership Project; Technical
Specification Group Services and System Aspects; Mandatory
Speech Codec speech processing functions; AMR speech codec;
Transcoding functions (Release 4)", and in Kondoz A. M.,
University of Surrey, UK, "Digital speech coding for low bit
rate communications systems," chapter 6: "Analysis-by-synthesis
coding of speech," pages 174-214, John Wiley & Sons,
Chichester, 1994.

The adaptive multi-rate (AMR) speech codec is based on the 
code-excited linear predictive (CELP) coding model. It 
consists of eight source codecs, or modes of operation, with 
bit-rates of 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 
kbit/s. The basic encoding and decoding principles of the AMR 
codec are explained briefly below. In addition, the matters 
relevant to the parameter domain gain control are discussed in 
more detail. 

The AMR encoding process comprises three main steps:

LPC (Linear predictive coding) analysis:

The short-term correlations between speech samples (formants)
are modeled and removed by a 10th order filter. In the AMR
codec the LP coefficients are calculated using the
autocorrelation method. The LP coefficients are further
transformed to Line Spectral Pairs (LSPs) for quantization and
interpolation purposes, utilizing the property of LSPs having a
strong correlation between adjacent subframes.

Pitch analysis (long-term prediction):

The long-term correlations between speech samples (voice 
periodicity) are modeled and removed by a pitch filter. The 
pitch lag is estimated from the perceptually weighted input 
speech signal by first using the computationally less 
expensive open-loop method. 

A more accurate pitch lag and pitch gain $g_p$ are then
estimated by a closed-loop analysis around the open-loop pitch
lag estimate, allowing also fractional pitch lags. The pitch
synthesis filter in AMR is implemented as shown in Fig. 1
using an adaptive codebook approach. That is, the adaptive
codebook vector $v(n)$ is computed by interpolating the past
excitation signal $u(n)$ at the given integer delay $k$ and
phase (fraction) $t$:

$$v(n) = \sum_{i=0}^{9} u(n-k-i)\, b_{60}(t + i \cdot 6) + \sum_{i=0}^{9} u(n-k+1+i)\, b_{60}(6 - t + i \cdot 6), \tag{1.1}$$

$$n = 0, \ldots, 39, \quad t = 0, \ldots, 5, \quad k = [18, 143],$$

where $b_{60}$ is an interpolation filter based on a Hamming
windowed sin(x)/x function.

Optimum excitation determination (innovative excitation
search):

As shown in Fig. 1, the speech is synthesized in the decoder
by adding appropriately scaled adaptive and fixed codebook
vectors together and feeding the sum through the short-term
synthesis filter. Once the parameters of the LP synthesis
filter and pitch synthesis filter are found, the optimum
excitation sequence in a codebook is chosen at the encoder
side using an analysis-by-synthesis search procedure in which
the error between the original and the synthesized speech is
minimized according to a perceptually weighted distortion
measure. The innovative excitation sequences consist of 10 to
2 (depending on the mode) nonzero pulses of amplitude ±1. The
search procedure determines the locations of these pulses in
the 40-sample subframe, as well as the appropriate fixed
codebook gain $g_c$.

The CELP model parameters, i.e. the LP filter coefficients, the
pitch parameters (the delay and the gain of the pitch filter),
and the fixed codebook vector and fixed codebook gain, are
encoded for transmission as LSP indices, adaptive codebook
index (pitch index) and adaptive codebook (pitch) gain index,
and fixed codebook indices and fixed codebook gain factor
index, respectively.

Next, quantization of the fixed codebook gain is explained. 

To make it efficient, the fixed codebook gain quantization is
performed using moving-average (MA) prediction with fixed
coefficients. The MA prediction is performed on the innovation
energy as follows. Let $E(n)$ be the mean-removed innovation
energy (in dB) at subframe $n$, given by:

$$E(n) = 10 \log_{10}\left(\frac{1}{N}\, g_c^2 \sum_{i=0}^{N-1} c^2(i)\right) - \bar{E}, \tag{1.2}$$

where $N = 40$ is the subframe size, $c(i)$ is the fixed codebook
excitation, and $\bar{E}$ (in dB) is the mean of the innovation
energy (a mode-dependent constant). The predicted energy is
given by:

$$\tilde{E}(n) = \sum_{i=1}^{4} b_i\, \hat{R}(n-i), \tag{1.3}$$

where $[b_1\ b_2\ b_3\ b_4] = [0.68\ 0.58\ 0.34\ 0.19]$ are the MA
prediction coefficients, and $\hat{R}(k)$ is the quantified
prediction error at subframe $k$:

$$\hat{R}(k) = E(k) - \tilde{E}(k). \tag{1.4}$$

Now, a predicted fixed codebook gain is computed using the
predicted energy as in Eq. (1.2) (by substituting $E(n)$ by
$\tilde{E}(n)$ and $g_c$ by $g'_c$). First, the mean innovation
energy is found by:

$$E_I = 10 \log_{10}\left(\frac{1}{N} \sum_{j=0}^{N-1} c^2(j)\right), \tag{1.5}$$

and then the predicted gain $g'_c$ is found by:

$$g'_c = 10^{0.05\left(\tilde{E}(n) + \bar{E} - E_I\right)}. \tag{1.6}$$

A correction factor between the gain $g_c$ and the estimated one
$g'_c$ is given by:

$$\gamma_{gc} = g_c / g'_c. \tag{1.7}$$

The prediction error and the correction factor are related as:

$$R(n) = E(n) - \tilde{E}(n) = 20 \log_{10}\left(\gamma_{gc}\right). \tag{1.8}$$
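
The relation between the prediction error and the correction factor follows in two lines once the gain is factored out of the energy expression. The short derivation below is a sketch; the shorthand $E_I$ for the energy of the unscaled fixed codebook vector is an assumption made here for readability.

```latex
% Assumed shorthand: E_I = 10 log10((1/N) sum_i c^2(i)), the energy of the
% unscaled fixed codebook vector; \bar{E} is the mode-dependent mean energy.
\begin{align*}
E(n) &= 20\log_{10} g_c + E_I - \bar{E}
  && \text{(Eq. 1.2 with the gain factored out)}\\
\tilde{E}(n) &= 20\log_{10} g'_c + E_I - \bar{E}
  && \text{(same form with the predicted gain)}\\
R(n) = E(n) - \tilde{E}(n) &= 20\log_{10}\frac{g_c}{g'_c} = 20\log_{10}\gamma_{gc}
\end{align*}
```

Both constant terms cancel in the difference, which is why the correction factor alone carries the prediction error.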

At the decoder, the transmitted speech parameters are decoded
and speech is synthesized.

Decoding of the fixed codebook gain

In case of scalar quantization (in modes 12.2 kbit/s and 7.95
kbit/s), the decoder receives an index to a quantization table
that gives the quantified fixed codebook gain correction
factor $\hat{\gamma}_{gc}$.

In case of vector quantization (in all the other modes), the
index gives both the quantified adaptive codebook gain
$\hat{g}_p$ and the fixed codebook gain correction factor
$\hat{\gamma}_{gc}$.

The fixed codebook gain correction factor gives the fixed
codebook gain in the same way as described above. First, the
predicted energy is found by:

$$\tilde{E}(n) = \sum_{i=1}^{4} b_i\, \hat{R}(n-i), \tag{1.9}$$

and then the mean innovation energy is found by:

$$E_I = 10 \log_{10}\left(\frac{1}{N} \sum_{j=0}^{N-1} c^2(j)\right). \tag{1.10}$$

The predicted gain is found by:

$$g'_c = 10^{0.05\left(\tilde{E}(n) + \bar{E} - E_I\right)}. \tag{1.11}$$

And finally, the quantified fixed codebook gain is obtained by:

$$\hat{g}_c = \hat{\gamma}_{gc}\, g'_c. \tag{1.12}$$
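
The decoding steps of Eqs. (1.9) to (1.12) can be sketched in a few lines of Python. This is a minimal illustration, not the reference decoder: the function names, the tuple-based history and the all-zero initial history used in the usage note are assumptions made for the example.

```python
import math

# MA prediction coefficients b1..b4 as quoted in the text.
B = (0.68, 0.58, 0.34, 0.19)


def innovation_energy_db(code):
    """E_I = 10*log10((1/N) * sum(c(j)^2)) for one subframe of codebook samples."""
    return 10.0 * math.log10(sum(c * c for c in code) / len(code))


def decode_fixed_gain(gamma_hat, code, past_errors_db, mean_energy_db):
    """Return (fixed codebook gain, updated error history) for one subframe.

    past_errors_db holds the quantified prediction errors R(n-1)..R(n-4)
    in dB, most recent first (hypothetical layout chosen for this sketch).
    """
    # Eq. (1.9): predicted energy from the four previous prediction errors.
    predicted_energy = sum(b * r for b, r in zip(B, past_errors_db))
    # Eq. (1.11): predicted gain.
    exponent = predicted_energy + mean_energy_db - innovation_energy_db(code)
    predicted_gain = 10.0 ** (0.05 * exponent)
    # Eq. (1.12): scale the predicted gain with the decoded correction factor.
    gain = gamma_hat * predicted_gain
    # Eq. (1.8): the new prediction error is 20*log10 of the correction factor.
    new_history = (20.0 * math.log10(gamma_hat),) + tuple(past_errors_db[:3])
    return gain, new_history
```

For instance, with ten unit pulses in a 40-sample subframe, a correction factor of 1.0, a mean energy of 36 dB and an all-zero history, `decode_fixed_gain` returns a gain of about 126.2 and leaves the history at zero.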

There are some differences between the AMR modes that are
relevant to the parameter domain gain control, as listed
below.

In the 12.2 kbit/s mode, the fixed codebook gain correction
factor $\gamma_{gc}$ is scalar quantized with 5 bits (32
quantization levels). The correction factor is computed using a
mean energy value $\bar{E}$ = 36 dB.

In the 10.2 kbit/s mode, the fixed codebook gain correction 
factor and the adaptive codebook gain are jointly vector 
quantized with 7 bits. The correction factor $\gamma_{gc}$ is computed
using a mean energy value $\bar{E}$ = 33 dB. Moreover, this mode
includes smoothing of the fixed codebook gain. The fixed 
codebook gain used for synthesis in the decoder is replaced by 
a smoothed value of the fixed codebook gains of the previous 5 
subframes. The smoothing is based on a measure of the 
stationarity of the short-term spectrum in the LSP (Line 
Spectral Pair) domain. The smoothing is performed to avoid 
unnatural fluctuations in the energy contour. 

In the 7.95 kbit/s mode, the fixed codebook gain correction
factor $\gamma_{gc}$ is scalar quantized with 5 bits, as in the
mode 12.2 kbit/s. The correction factor is computed using a
mean energy value $\bar{E}$ = 36 dB. This mode includes
anti-sparseness processing. An adaptive anti-sparseness
post-processing procedure is applied to the fixed codebook
vector $c(n)$ in order to reduce perceptual artifacts arising
from the sparseness of the algebraic fixed codebook vectors,
which have only a few non-zero samples per impulse response.
The anti-sparseness processing consists of circular convolution
of the fixed codebook vector with one of three pre-stored
impulse responses. The selection of the impulse response is
performed adaptively from the adaptive and fixed codebook gains.

In the 7.40 kbit/s mode, the fixed codebook gain correction 
factor and the adaptive codebook gain are jointly vector 
quantized with 7 bits, as in the mode 10.2 kbit/s. The 
correction factor is computed using a mean energy value
$\bar{E}$ = 30 dB.

In the 6.70 kbit/s mode, the fixed codebook gain correction
factor $\gamma_{gc}$ and the adaptive codebook gain $g_p$ are
jointly vector quantized with 7 bits, as in the mode 10.2
kbit/s. The correction factor is computed using a mean energy
value $\bar{E}$ = 28.75 dB. This mode includes smoothing of the
fixed codebook gain, and anti-sparseness processing.

In the 5.90 and 5.15 kbit/s modes, the fixed codebook gain
correction factor $\gamma_{gc}$ and the adaptive codebook gain
$g_p$ are jointly vector quantized with 6 bits. The correction
factor $\gamma_{gc}$ is computed using a mean energy value
$\bar{E}$ = 33 dB. The modes include smoothing of the fixed
codebook gain and anti-sparseness processing.

In the 4.75 kbit/s mode, the fixed codebook gain correction 
factor $\gamma_{gc}$ and the adaptive codebook gain $g_p$ are jointly vector
quantized only every 10 ms by a unique method as described in 
3GPP TS 26.090 V4.0.0 (2001-03) "3rd Generation Partnership 
Project; Technical Specification Group Services and System 
Aspects; Mandatory Speech Codec speech processing functions; 
AMR speech codec; Transcoding functions (Release 4)". This 
mode includes smoothing of the fixed codebook gain and anti- 
sparseness processing. 
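
For a gain control implementation, the mode differences listed above are conveniently held in a lookup table. The sketch below is one possible layout; the class and field names are hypothetical, and the flags only record what the text above mentions for each mode. The bit count and mean energy of the 4.75 kbit/s mode are left unset because that mode quantizes two subframes jointly and is described separately.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GainQuantization:
    joint_with_pitch_gain: bool      # True: vector quantized with the adaptive codebook gain
    bits: Optional[int]              # bits of the gain index per subframe, if stated
    mean_energy_db: Optional[float]  # mean innovation energy, if stated
    smoothing: bool                  # fixed codebook gain smoothing mentioned
    anti_sparseness: bool            # anti-sparseness processing mentioned


# Keys are the mode bit rates in kbit/s, written as strings to avoid float keys.
AMR_GAIN_MODES = {
    "12.2": GainQuantization(False, 5, 36.0, False, False),
    "10.2": GainQuantization(True, 7, 33.0, True, False),
    "7.95": GainQuantization(False, 5, 36.0, False, True),
    "7.40": GainQuantization(True, 7, 30.0, False, False),
    "6.70": GainQuantization(True, 7, 28.75, True, True),
    "5.90": GainQuantization(True, 6, 33.0, True, True),
    "5.15": GainQuantization(True, 6, 33.0, True, True),
    "4.75": GainQuantization(True, None, None, True, True),
}


def needs_joint_index_search(mode):
    """True if changing the level means replacing a joint (vector) gain index."""
    return AMR_GAIN_MODES[mode].joint_with_pitch_gain
```

With this table, `needs_joint_index_search("12.2")` is false and `needs_joint_index_search("6.70")` is true, which selects between the scalar and the joint index replacement.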

Discontinuous transmission (DTX)

During discontinuous transmission (DTX), only the average
background noise information is transmitted at regular
intervals to the decoder when speech is not present, as
described in 3GPP TS 26.092 V4.0.0 (2001-03), "3rd Generation
Partnership Project; Technical Specification Group Services
and System Aspects; Mandatory Speech Codec speech processing
functions; AMR speech codec; Comfort noise aspects (Release
4)". At the far end the decoder reconstructs the background
noise according to the transmitted noise parameters, thus
avoiding extremely annoying discontinuities in the background
noise in the synthesized speech.

The comfort noise parameters, i.e. information on the level and
the spectrum of the background noise, are encoded into a special
frame called a Silence Descriptor (SID) frame for transmission
to the receive side.

For parameter domain gain control purposes, the information on
the level of the background noise is of interest. If the gain
level were adjusted only during speech frames, the background
noise level would change abruptly at the beginning and end of
noise-only bursts, as illustrated in Fig. 2. The level changes
in the background noise are subjectively very annoying, see
e.g. Kondoz A. M., University of Surrey, UK, "Digital speech
coding for low bit rate communications systems," page 336,
John Wiley & Sons, Chichester, 1994. The greater the
amplification or attenuation, the more annoying the level
changes are. If the level of speech is adjusted, the level of
the background noise also has to be adjusted accordingly to
prevent any fluctuations in the background noise level.

At the transmitting side, the frame energy is computed for
each frame marked with VAD=0 (Voice Activity Detection)
according to the equation:

$$en_{log}(i) = \frac{1}{2} \log_2\left(\frac{1}{N} \sum_{n=0}^{N-1} s^2(n)\right), \tag{1.13}$$

where $s(n)$ is the high-pass filtered input speech signal of the
current frame $i$.

The averaged logarithmic energy is computed by:

$$en_{log}^{mean}(i) = \frac{1}{8} \sum_{n=0}^{7} en_{log}(i - n). \tag{1.14}$$

The averaged logarithmic frame energy is quantized by means of 
a 6-bit algorithmic quantizer. These 6 bits for the energy 
index are transmitted in the SID frame. 
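
The energy computation for the SID frame can be sketched as follows. This is an illustration under stated assumptions: the logarithmic frame energy is taken as half the base-2 logarithm of the mean square of the frame, and the average runs over the eight most recent frames, as in the comfort noise specification cited above. The frame length of 160 samples, the small energy floor and the class name are choices made for this example only.

```python
import math
from collections import deque

FRAME_LEN = 160        # assumed: 20 ms frame at 8 kHz sampling
ENERGY_FLOOR = 1e-12   # hypothetical floor so that log2 is defined for silence


def frame_log_energy(frame):
    """Logarithmic energy of one frame: 0.5 * log2(mean of squared samples)."""
    mean_square = sum(x * x for x in frame) / len(frame)
    return 0.5 * math.log2(max(mean_square, ENERGY_FLOOR))


class LogEnergyAverager:
    """Running mean of the logarithmic frame energy over the last 8 frames."""

    def __init__(self, length=8):
        self.history = deque(maxlen=length)

    def update(self, frame):
        self.history.append(frame_log_energy(frame))
        # Until 8 frames have been seen, the mean is taken over what is available.
        return sum(self.history) / len(self.history)
```

A frame of constant amplitude 4 has a mean square of 16 and therefore a logarithmic energy of 2.0. A parameter domain level control would have to shift this transmitted level consistently with the gain applied to the speech frames.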

In the following, gain control in the parameter domain is 
described. 

The fixed codebook gain $g_c$ adjusts the level of the
synthesized speech in the AMR speech codec, as can be seen by
studying equation (1.1) and the speech synthesis model shown in
Fig. 1.

The adaptive codebook gain $g_p$ controls the periodicity (pitch)
of the synthesized speech, and is limited to the range [0, 1.2].
As shown in Fig. 1, an adaptive feedback loop transmits the
effect of the fixed codebook gain also to the adaptive
codebook branch of the synthesis model, thereby adjusting also
the voiced part of the synthesized speech.

The speed at which the change in the fixed codebook gain is
transmitted to the adaptive codebook branch depends on the
pitch delay $T$ and the pitch gain $g_p$, as illustrated in Fig. 3.
The longer the pitch delay and the higher the pitch gain, the
longer it takes for the adaptive codebook vector $v(n)$ to
stabilize (to reach its corresponding level).

For real speech signals, the pitch gain and delay vary.
However, the simulation with a fixed pitch delay and pitch
gain gives a rough estimate of the limits of the stabilization
time of the adaptive codebook after a change in the fixed
codebook gain. The pitch delay in AMR is limited to the range
[18, 143] samples, as in the example, corresponding to high
child and low male pitches, respectively. The pitch gain,
however, may have values in the range [0, 1.2]. For zero pitch
gain, there is naturally no delay at all. On the other hand,
the pitch gain takes values at or above 1 only for very short
time instants, so that the adaptive codebook does not become
unstable. Therefore, the estimated maximum delay is around a
few thousand samples, about half a second.

Fig. 3 shows the response of the adaptive codebook to a step
function (sudden change in $g_c$) as a function of pitch delay $T$
(integer lag $k$ in Eq. (1.1)) and pitch gain $g_p$. The output of
the scaled fixed codebook, $g_c \cdot c(n)$, changes from 0 to 0.3 at
time instant 0 samples. The output of the adaptive codebook
(and thus also the excitation signal $u(n)$) reaches its
corresponding level after 108 to 5430 samples, for the pitch
delays $T$ and pitch gains $g_p$ of the example.
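
The dependence of the stabilization time on pitch delay and pitch gain can be explored with a small simulation of the feedback loop. The sketch below assumes an integer pitch delay, a constant pitch gain below 1 and a 1 % relative tolerance as the criterion for "reached its level"; the criterion behind Fig. 3 is not stated in the text, so the numbers are indicative only.

```python
def settling_time(pitch_delay, pitch_gain, step=0.3, tolerance=0.01,
                  max_samples=20000):
    """Samples until the excitation is within `tolerance` of its final level.

    Model: u(n) = step + pitch_gain * u(n - pitch_delay) for n >= 0 and
    u(n) = 0 for n < 0, i.e. the scaled fixed codebook output jumps from 0
    to `step` at n = 0 and is fed back through the adaptive codebook.
    """
    if not 0.0 <= pitch_gain < 1.0:
        raise ValueError("pitch_gain must be in [0, 1) for a finite final level")
    final_level = step / (1.0 - pitch_gain)
    u = []
    for n in range(max_samples):
        past = u[n - pitch_delay] if n >= pitch_delay else 0.0
        u.append(step + pitch_gain * past)
        if abs(final_level - u[n]) <= tolerance * final_level:
            return n
    return None  # did not settle within max_samples
```

With a pitch delay of 18 samples and a pitch gain of 0.5 the function returns 108 samples, and with zero pitch gain it returns 0. Longer delays and higher gains give larger values, in line with the qualitative statement above.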

In the highest bit rate mode, 12.2 kbit/s, the fixed codebook
gain correction factor $\gamma_{gc}$ is scalar quantized with 5
bits, giving 32 quantization levels, as shown in Fig. 4. The
quantization is nonlinear. The quantization steps are shown in
Fig. 5. The quantization step is between 1.2 dB and 2.3 dB.
The same quantization table is used in the mode 7.95 kbit/s. In
all other modes, the fixed codebook gain factor is jointly
vector quantized with the adaptive codebook gain. These
quantization tables are shown in Figs. 6 and 7.






The lowest mode, 4.75 kbit/s, uses vector quantization in a
unique way. In the mode 4.75 kbit/s the adaptive codebook
gains $g_p$ and the correction factors $\hat{\gamma}_{gc}$ are
jointly vector quantized every 10 ms with 6 bits, i.e. two
codebook gains of two subframes and two correction factors are
jointly vector quantized.

Fig. 5 shows the difference between adjacent quantization levels
in the quantization table of the fixed codebook gain factor
$\gamma_{gc}$ in the modes 12.2 kbit/s and 7.95 kbit/s. The
quantization table is approximately linear between indexes 5
and 28. The quantization step in that range is about 1.2 dB.

Fig. 6 shows the vector quantization table for the adaptive 
codebook gain and the fixed codebook gain factor in the modes 
10.2, 7.4 and 6.7 kbit/s. The table is printed so that one 
index value gives both the fixed codebook gain factor and the 
corresponding (jointly quantized) adaptive codebook gain. As 
can be seen from Fig. 6, there are approximately 16 levels to 
choose from for the fixed codebook gain while the adaptive 
codebook gain remains fairly fixed. 

Fig. 7 shows the vector quantization table for the adaptive 
codebook gain and the fixed codebook gain factor in the modes 
5.90 and 5.15 kbit/s. Again, the table is printed so that one 
index value gives both the fixed codebook gain factor and the 
corresponding (jointly quantized) adaptive codebook gain. 

As explained above, the speech level control in the parameter 
domain must take place by adjusting the fixed codebook gain. 
To be more specific, the quantized fixed codebook gain 
correction factor is adjusted, which is one of the speech 
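parameters transmitted to the far-end.

In the following, the relationship between amplification of
the fixed codebook gain correction factor and the
amplification of the fixed codebook gain is shown. As already
shown in Eqs. (1.11) and (1.12), the fixed codebook gain is
defined as:

$$\hat{g}_c(n) = \hat{\gamma}_{gc}(n)\, 10^{0.05\left[\sum_{i=1}^{4} b_i\, 20\log_{10}\left(\hat{\gamma}_{gc}(n-i)\right) + \bar{E} - E_I\right]}. \tag{2.1}$$

If the fixed codebook gain correction factor $\hat{\gamma}_{gc}(n)$
is amplified by $\beta$ at subframe $n$, and $\beta$ is kept
unchanged at least for the following four subframes, the new
quantized fixed codebook gain becomes:

$$g_c^{new}(n) = \beta\, \hat{\gamma}_{gc}(n)\, 10^{0.05\left[\sum_{i=1}^{4} b_i\, 20\log_{10}\left(\hat{\gamma}_{gc}(n-i)\right) + \bar{E} - E_I\right]} = \beta\, \hat{g}_c(n). \tag{2.2}$$

In the next subframe, $n+1$, the new fixed codebook gain
becomes:

$$g_c^{new}(n+1) = \beta\, \hat{\gamma}_{gc}(n+1)\, 10^{0.05\left[b_1 20\log_{10}\left(\beta \hat{\gamma}_{gc}(n)\right) + \sum_{i=2}^{4} b_i\, 20\log_{10}\left(\hat{\gamma}_{gc}((n+1)-i)\right) + \bar{E} - E_I\right]} \tag{2.3}$$

$$g_c^{new}(n+1) = \beta\, \hat{\gamma}_{gc}(n+1)\, 10^{0.05\left[b_1 20\log_{10}(\beta) + \sum_{i=1}^{4} b_i\, 20\log_{10}\left(\hat{\gamma}_{gc}((n+1)-i)\right) + \bar{E} - E_I\right]} \tag{2.4}$$

$$g_c^{new}(n+1) = \beta\, \hat{\gamma}_{gc}(n+1)\, 10^{0.05\, b_1 20\log_{10}(\beta)}\, 10^{0.05\left[\sum_{i=1}^{4} b_i\, 20\log_{10}\left(\hat{\gamma}_{gc}((n+1)-i)\right) + \bar{E} - E_I\right]} \tag{2.5}$$

Since $10^{0.05\, b_1 20\log_{10}(\beta)} = \beta^{b_1}$, this gives
$g_c^{new}(n+1) = \beta\, \beta^{b_1}\, \hat{g}_c(n+1)$.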






In the same way, in the following subframes, $n+2$, ..., $n+4$,
the amplified fixed codebook gain becomes:

$$g_c^{new}(n+2) = \beta\, \beta^{b_1}\, \beta^{b_2}\, \hat{g}_c(n+2) \tag{2.8}$$

$$g_c^{new}(n+4) = \beta^{(1 + b_1 + b_2 + b_3 + b_4)}\, \hat{g}_c(n+4). \tag{2.9}$$

Since the prediction coefficients were given as
$[b_1\ b_2\ b_3\ b_4] = [0.68\ 0.58\ 0.34\ 0.19]$,
the fixed codebook gain stabilizes after five subframes into a
value:

$$g_c^{new}(n+4) = \beta^{2.79}\, \hat{g}_c(n+4). \tag{2.10}$$

In other words, multiplying the fixed codebook gain factor
by $\beta$ results in multiplication of the fixed codebook gain
(and therefore also the synthesized speech) by $\beta^{2.79}$,
assuming that $\beta$ is held constant at least during the next
four subframes. Therefore, e.g. in AMR modes 12.2 kbit/s and
7.95 kbit/s, the minimum change for the fixed codebook gain
factor (the minimum quantization step) of ±1.2 dB results in a
±3.4 dB change in the fixed codebook gain, and hence in the
synthesized speech signal, as shown below.

$$20\log_{10}\beta = 1.2\ \mathrm{dB} \;\Rightarrow\; \beta = 1.15$$

$$20\log_{10}\left(\beta^{2.79}\right) = 3.4\ \mathrm{dB} \tag{2.11}$$

This ±3.4 dB change in the synthesized speech level takes place
gradually, as illustrated in Fig. 8.

Fig. 8 shows a change in the fixed codebook gain (AMR 12.2
kbit/s), when the fixed codebook gain factor is changed one
quantization step (in the linear quantization range) first
upwards at subframe 6 and then downwards at subframe 16. The
1.2 dB amplification (or attenuation) of the fixed codebook
gain factor amplifies (or attenuates) the fixed codebook gain
gradually by 3.4 dB during 5 subframes (200 samples).
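
The gradual build-up of the level change can be checked numerically by running the MA predictor in the dB domain. The sketch below is illustrative; the function names are hypothetical, and the constant term (mean energy minus innovation energy) is omitted because it cancels in the difference.

```python
# MA prediction coefficients b1..b4 as quoted in the text.
B = (0.68, 0.58, 0.34, 0.19)


def gain_track_db(factor_db_per_subframe):
    """Fixed codebook gain in dB, up to a constant, for a sequence of
    correction factors given in dB (one value per subframe)."""
    history = [0.0, 0.0, 0.0, 0.0]   # past correction factors in dB, newest first
    track = []
    for factor_db in factor_db_per_subframe:
        predicted_db = sum(b * r for b, r in zip(B, history))
        track.append(factor_db + predicted_db)
        history = [factor_db] + history[:3]
    return track


def level_change_db(step_db, subframes=8):
    """Gain change per subframe when every correction factor is raised by
    step_db from subframe 0 onwards."""
    baseline = gain_track_db([0.0] * subframes)
    boosted = gain_track_db([step_db] * subframes)
    return [b - a for a, b in zip(baseline, boosted)]
```

Calling `level_change_db(1.2)` yields approximately 1.2, 2.02, 2.71, 3.12 and 3.35 dB for the first five subframes and stays at 3.35 dB afterwards, i.e. 2.79 times the 1.2 dB step.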

Consequently, the parameter level gain control of coded speech 
may be made by changing the index value of the fixed codebook 
gain factor. That is, the index value in the bit stream is 
replaced by a new value that gives the desired 
amplification/attenuation. The gain values corresponding to 
the index changes for AMR mode 12.2 kbit/s are listed in the 
table below. 

Table I: Parameter level gain values for AMR 12.2 kbit/s.

Change in the fixed codebook     Resulting amplification/
gain factor index value          attenuation of the speech signal

+4                               13.6 dB
+3                               10.2 dB
+2                               6.8 dB
+1                               3.4 dB
0                                0 dB
-1                               -3.4 dB
-2                               -6.8 dB
-3                               -10.2 dB
-4                               -13.6 dB


Next, a search for the correct index for the desired change in
the overall gain is described by taking into account the
nonlinear nature of the fixed codebook gain factor
quantization.

The new fixed codebook gain factor quantization index
corresponding to the desired amplification/attenuation of the
speech signal is found by minimizing the error:

$$\left|\beta\, \hat{\gamma}_{gc}^{old} - \hat{\gamma}_{gc}^{new}\right|, \tag{2.12}$$

where $\hat{\gamma}_{gc}^{old}$ and $\hat{\gamma}_{gc}^{new}$ are the
old and the new fixed codebook gain correction factors and
$\beta$ is the desired multiplier: $\beta = \Delta^{j}$,
$j = [\ldots, -4, -3, \ldots, 0, \ldots, +3, +4, \ldots]$, where
$\Delta$ is the minimum quantization step (1.15 in AMR 12.2
kbit/s). Note that the speech signal becomes
amplified/attenuated with $\beta^{2.79}$.
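
The index search of Eq. (2.12) amounts to a nearest-neighbour lookup in the scalar quantization table. The sketch below illustrates this; the function name is hypothetical, and the table is a made-up 32-level table with a uniform 1.2 dB step, not the actual AMR table of Fig. 4, which is nonlinear at both ends.

```python
def find_scalar_index(table, old_index, beta):
    """Index whose table value is closest to beta times the old value."""
    target = beta * table[old_index]
    return min(range(len(table)), key=lambda i: abs(target - table[i]))


# Hypothetical 32-level table with uniform 1.2 dB spacing (NOT the AMR table).
STEP_DB = 1.2
HYPOTHETICAL_TABLE = [0.16 * 10 ** (STEP_DB * i / 20.0) for i in range(32)]
```

With this uniform table, a desired multiplier of two quantization steps moves index 10 to index 12. At the top of the table the search saturates, for example `find_scalar_index(HYPOTHETICAL_TABLE, 31, 1.15)` stays at 31, so the realized gain can be smaller than the desired one.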

Fig. 9 shows the re-quantized levels for cases +3.4, +6.8,
+10.2, +13.6 and +17.0 dB signal amplification achieved with
the above error minimization procedure. Fig. 10 also shows the
quantization levels in cases of signal attenuation. Both
figures show the quantization levels for the AMR mode 12.2
kbit/s.

In Fig. 9 the lowest curve shows the original quantization 
levels of the fixed codebook gain factor. The second lowest 
curve shows re-quantized levels of the fixed codebook gain 
factor in the case of +3.4 dB signal level amplification, and 
the subsequent curves show re-quantized levels of the fixed 
codebook gain factor in cases +6.8, +10.2, +13.6 and +17 dB 
signal level amplification, respectively. 

Fig. 10 shows re-quantized levels of the fixed codebook gain
factor in cases: -17, -13.6, -3.4, 0, +3.4, +13.6, +17
dB signal level amplification. The curve in the middle shows
the original quantization levels of the fixed codebook gain
factor.



In AMR modes 10.2 kbit/s, 7.40 kbit/s, 6.70 kbit/s, 5.90
kbit/s, 5.15 kbit/s and 4.75 kbit/s, Eq. (2.12) is
replaced by:

$$\left|\beta\, \hat{\gamma}_{gc}^{old} - \hat{\gamma}_{gc}^{new}\right| + weight \cdot \left|g_{p\_new} - g_{p\_old}\right|, \tag{2.13}$$

where the weight is > 1, and $g_{p\_new}$ and $g_{p\_old}$ are the
new and old adaptive codebook gains, respectively.

In other words, in modes 12.2 kbit/s and 7.95 kbit/s, the new
fixed codebook gain factor index is found as the index which
minimizes the error given in Eq. (2.12). In modes 10.2 kbit/s,
7.40 kbit/s, 6.70 kbit/s, 5.90 kbit/s, 5.15 kbit/s and 4.75
kbit/s the new joint index of the vector quantized fixed
codebook gain factor and adaptive gain is found as the index
which minimizes the error given in Eq. (2.13). The rationale
behind Eq. (2.13) is to be able to change the fixed codebook
gain factor without introducing audible error to the adaptive
codebook gain. Fig. 6 shows the vector quantized fixed codebook
gain factors and adaptive codebook gains at different index
values. From Fig. 6 it can be seen that it is possible to
change the fixed codebook gain factor without having to change
the adaptive codebook gain excessively.
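
The joint index search of Eq. (2.13) can be sketched in the same way as the scalar case, with a penalty on changes of the adaptive codebook gain. The function name, the default weight of 2 and the six-entry table below are hypothetical and serve only to show the mechanism; the real tables are those of Figs. 6 and 7.

```python
def find_joint_index(table, old_index, beta, weight=2.0):
    """Joint index minimizing the weighted error of Eq. (2.13).

    `table` is a list of (adaptive codebook gain, correction factor) pairs.
    """
    old_pitch_gain, old_factor = table[old_index]
    target = beta * old_factor

    def error(index):
        pitch_gain, factor = table[index]
        return abs(target - factor) + weight * abs(pitch_gain - old_pitch_gain)

    return min(range(len(table)), key=error)


# Hypothetical joint table: (adaptive codebook gain, correction factor).
HYPOTHETICAL_JOINT_TABLE = [
    (0.30, 0.50), (0.30, 0.58), (0.32, 0.66),
    (0.80, 0.50), (0.80, 0.58), (0.82, 0.66),
]
```

Starting from entry 3 with a multiplier of 1.15, the search selects entry 4, which raises the correction factor while leaving the adaptive codebook gain at 0.80. Without the penalty term, entry 1 would be an equally good match for the correction factor although it changes the pitch gain from 0.80 to 0.30, which is the situation the weight is meant to prevent.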

As mentioned above, in the mode 4.75 kbit/s the adaptive
codebook gains and the correction factors are jointly
vector quantized every 10 ms with 6 bits, i.e. two codebook
gains of two subframes and two correction factors are jointly
vector quantized. The codebook search is done by minimizing a
weighted sum of the error criterion for each of the two
subframes. The default values of the weighting factors are 1.
If the energy of the second subframe is more than two times
the energy of the first subframe, the weight of the first
subframe is set to 2. If the energy of the first subframe is
more than four times the energy of the second subframe, the
weight of the second subframe is set to 2. Despite these
differences, the mode 4.75 kbit/s can be processed with the
vector quantization scheme described above.
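
The subframe weighting rule for the mode 4.75 kbit/s may be illustrated by the following sketch. The function names subframe_weights and weighted_pair_error are hypothetical and only restate the rule given in the text; the per-subframe errors are assumed to be computed elsewhere.

```python
def subframe_weights(energy_first, energy_second):
    """Return (weight_first, weight_second) for a pair of subframes."""
    w_first, w_second = 1.0, 1.0  # default weighting factors
    if energy_second > 2.0 * energy_first:
        w_first = 2.0
    if energy_first > 4.0 * energy_second:
        w_second = 2.0
    return w_first, w_second


def weighted_pair_error(err_first, err_second, energy_first, energy_second):
    """Weighted sum of the error criteria of the two subframes."""
    w_first, w_second = subframe_weights(energy_first, energy_second)
    return w_first * err_first + w_second * err_second


if __name__ == "__main__":
    print(subframe_weights(1.0, 3.0))
    print(weighted_pair_error(0.5, 0.25, 1.0, 3.0))
```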

Thus, according to the above-described embodiment, a new gain
index (new index value) minimizing the error between the
desired gain $\beta \cdot \hat{\gamma}_{gc}^{old}$ (enhanced first parameter value) and the
realized effective gain $\hat{\gamma}_{gc}^{new}$ (new first parameter value)
according to Eq. (2.12) or (2.13) is determined according to
the quantization tables for the respective modes. The new
fixed codebook gain correction factor (and the new adaptive
codebook gain in case of modes other than 12.2 kbit/s and
7.95 kbit/s) corresponds to the determined new gain index. The
old gain index (current index value) representing the old
fixed codebook gain correction factor $\hat{\gamma}_{gc}^{old}$ (current first
parameter value) (and the old adaptive codebook gain $g_{p\_old}$
(current second parameter value) in case of modes other than
12.2 kbit/s and 7.95 kbit/s) is then replaced by the new gain
index.

In the following, alternative methods for providing an 
improved gain accuracy are described. First, it is
illustrated how the total desired gain is formulated in case
the gain is not kept constant during five consecutive
subframes.



As described above, in the AMR-codec, the fixed codebook gain
is encoded using the fixed codebook gain correction factor $\hat{\gamma}_{gc}$.
The gain correction factor is used to scale the predicted
fixed codebook gain $g'_c$ to obtain the fixed codebook gain $g_c$,
i.e.

$g_c = \hat{\gamma}_{gc} \cdot g'_c \Leftrightarrow \hat{\gamma}_{gc} = g_c / g'_c$.

The fixed codebook gain is predicted as follows:

$g'_c(n) = 10^{0.05 [\sum_{i=1}^{4} b_i \cdot 20 \log_{10}(\hat{\gamma}_{gc}(n-i)) + \bar{E} - E_I]}$,   (3.1)

where $\bar{E}$ is a mode dependent energy value (in dB) and $E_I$ is the
fixed codebook excitation energy (in dB).
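
As an illustration, the prediction can be evaluated as in the following sketch. The function name predict_fixed_codebook_gain and its argument names are assumptions for the example. The coefficients are the four prediction coefficients 0.68, 0.58, 0.34 and 0.19 used in this description, and the two energy values are taken as given inputs in dB.

```python
import math

# Prediction coefficients for subframes n-1 ... n-4.
B = (0.68, 0.58, 0.34, 0.19)


def predict_fixed_codebook_gain(prev_factors, e_mean_db, e_innov_db):
    """Predicted fixed codebook gain g'_c(n).

    prev_factors: correction factors of subframes n-1 ... n-4
                  (most recent first).
    e_mean_db:    mode dependent energy value in dB.
    e_innov_db:   fixed codebook excitation energy in dB.
    """
    predicted_db = sum(b * 20.0 * math.log10(g)
                       for b, g in zip(B, prev_factors))
    return 10.0 ** (0.05 * (predicted_db + e_mean_db - e_innov_db))


if __name__ == "__main__":
    print(predict_fixed_codebook_gain((1.0, 1.0, 1.0, 1.0), 20.0, 0.0))
```

Since 0.05 times 20 equals 1, a correction factor of 10 in the previous subframe alone raises the prediction by a factor of 10 to the power 0.68.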

To obtain a desired overall signal gain $\alpha$, the quantized
fixed codebook correction factor has to be multiplied by a
correction factor gain $\beta$. Realized correction factor gains are
denoted with $\hat{\beta}(n-i)$, $i > 0$. By amplifying the fixed codebook
correction factor $\hat{\gamma}_{gc}(n)$ with $\beta(n)$ at subframe n, the new
quantized fixed codebook gain becomes (note that the
prediction $g'_c$ depends on the history of the correction gains,
as shown in Equation 2.14):

$\hat{g}_c^{new}(n) = \beta(n) \hat{\gamma}_{gc}(n) {g'}_c^{new}(n)$

${g'}_c^{new}(n) = 10^{0.05 [\sum_{i=1}^{4} b_i \cdot 20 \log_{10}(\hat{\beta}(n-i) \hat{\gamma}_{gc}(n-i)) + \bar{E} - E_I]}$

${g'}_c^{new}(n) = 10^{\sum_{i=1}^{4} b_i \log_{10}(\hat{\beta}(n-i))} \cdot 10^{\sum_{i=1}^{4} b_i \log_{10}(\hat{\gamma}_{gc}(n-i)) + 0.05 \bar{E} - 0.05 E_I}$

Therefore, a new prediction, which is obtained using the
realized factor gains $\hat{\beta}(n-i)$, can be written as

${g'}_c^{new}(n) = 10^{\sum_{i=1}^{4} b_i \log_{10}(\hat{\beta}(n-i))} \cdot g'_c(n)$. Furthermore,

$\hat{g}_c^{new}(n) = \beta(n) \cdot 10^{\sum_{i=1}^{4} b_i \log_{10}(\hat{\beta}(n-i))} \cdot \hat{\gamma}_{gc}(n) g'_c(n)$

$\hat{g}_c^{new}(n) = g_T \cdot \hat{g}_c(n)$,

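i.e., the target correction factor gain for the present
subframe can be written as

$g_T = \beta(n) \cdot 10^{\sum_{i=1}^{4} b_i \log_{10}(\hat{\beta}(n-i))} \Leftrightarrow \beta(n) = g_T / 10^{\sum_{i=1}^{4} b_i \log_{10}(\hat{\beta}(n-i))}$.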

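If $\beta(n)$ is kept constant, the overall gain stabilizes after
five subframes into a value

$g_T = \beta \cdot 10^{\sum_{i=1}^{4} b_i \log_{10}(\beta)} = 10^{(1 + \sum_{i=1}^{4} b_i) \log_{10}(\beta)} = \beta^{1 + \sum_{i=1}^{4} b_i} = \beta^{2.79} = \alpha$,

because the prediction coefficients were given as
b = [1, 0.68, 0.58, 0.34, 0.19].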
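
The stabilization after five subframes can be checked numerically with the following sketch. The helper names overall_gain and simulate_constant_gain are hypothetical. The history is assumed to start from unity gains, and the correction factor gain is chosen as the overall gain raised to the power 1/2.79, which follows from the relation above.

```python
import math

# Prediction coefficients for subframes n-1 ... n-4.
B = (0.68, 0.58, 0.34, 0.19)


def overall_gain(beta_now, realized_history):
    """g_T = beta(n) * 10 ** sum(b_i * log10(beta_hat(n - i)))."""
    exponent = sum(b * math.log10(bh)
                   for b, bh in zip(B, realized_history))
    return beta_now * 10.0 ** exponent


def simulate_constant_gain(beta, subframes=8):
    """Overall gain per subframe when beta is held constant."""
    history = [1.0, 1.0, 1.0, 1.0]  # most recent first, no gain yet
    gains = []
    for _ in range(subframes):
        gains.append(overall_gain(beta, history))
        history = [beta] + history[:3]
    return gains


if __name__ == "__main__":
    alpha = 2.0                     # desired overall gain (about +6 dB)
    beta = alpha ** (1.0 / 2.79)    # constant correction factor gain
    for n, g in enumerate(simulate_constant_gain(beta)):
        print(n, round(g, 4))
```

The gain grows over the first four subframes and equals the desired overall gain from the fifth subframe onward.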

Next, a first alternative of the above described gain
manipulation is described, which first alternative is referred
to as Synthesizing Error Minimization (synthesizing method).

The algorithm according to the synthesizing method follows as
much as possible the original error criteria given for the
scalar quantization as

$E_{SQ} = (g_c - \hat{\gamma}_{gc} g'_c)^2$,

where $E_{SQ}$ is the fixed codebook quantization error and $g_c$ is
the target fixed codebook gain. As mentioned before, the goal
is to scale the fixed codebook gain with the desired total
gain $g_T = \hat{g}_c^{new} / \hat{g}_c$. Therefore, for the CDALC (Coded Domain Automatic
Level Control) purposes, the target must be scaled by the
desired gain, i.e.

$E_{SQ} = (g_T \cdot g_c - \hat{\gamma}_{gc} g'_c)^2$.   (3.2)



In the vector quantization, the pitch gain $g_p$ and the fixed
codebook correction factor $\hat{\gamma}_{gc}$ are jointly quantized. In the
AMR encoder, the vector quantization index is found by
minimizing the quantization error $E_{VQ}$ defined as

$E_{VQ} = \| x - \hat{g}_p y - \hat{g}_c z \|$,

where x, y and z are a target vector, a weighted LP-filtered
adaptive codebook vector and a weighted LP-filtered fixed
codebook vector, respectively. The error criterion is actually
a norm of the perceptually weighted error between the target
and the synthesized speech. Following the procedure of the
scalar quantization, the target vector is replaced by the
scaled version, i.e.

$E_{VQ} = \| (g_p y + g_T \cdot g_c z) - \hat{g}_p y - \hat{g}_c z \|$.   (3.3)

In the following, the synthesizing method is described for the 
scalar quantization. 

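The derivation of the minimization criterion is started from
Equation 3.2 used in the AMR-encoder and given as:

$E_{SQ} = (g_T \cdot g_c - \hat{\gamma}_{gc} g'_c)^2$.

Unfortunately, there is no direct access to $g_c$; however, it can
be approximated by $g_c \approx \hat{\gamma}_{gc}^{old} g'_c$, and therefore the first CDALC error
criterion for the scalar quantization can be written as

$E_{SQ} = (g_T \hat{\gamma}_{gc}^{old} g'_c - \hat{\gamma}_{gc}^{new} \cdot 10^{\sum_{i=1}^{4} b_i \log_{10}(\hat{\beta}(n-i))} g'_c)^2$

$E_{SQ} = (g'_c)^2 (g_T \hat{\gamma}_{gc}^{old} - 10^{\sum_{i=1}^{4} b_i \log_{10}(\hat{\beta}(n-i))} \hat{\gamma}_{gc}^{new})^2$

$\Leftrightarrow E_{SQ} / (g'_c)^2 = (g_T \hat{\gamma}_{gc}^{old} - 10^{\sum_{i=1}^{4} b_i \log_{10}(\hat{\beta}(n-i))} \hat{\gamma}_{gc}^{new})^2$,   (3.4)

where $\hat{\beta}(n-i)$ is the realized correction factor gain for the
subframe (n-i), i.e.

$\hat{\beta}(n-i) = \hat{\gamma}_{gc}^{new}(n-i) / \hat{\gamma}_{gc}^{old}(n-i)$.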

This error criterion is simple to evaluate and only the fixed 
codebook correction factor has to be decoded. Furthermore, 
four previous realized correction factor gains have to be kept 
in the memory. 
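
A minimal sketch of this scalar criterion is given below. The function name select_scalar_index and the table TOY_TABLE are assumptions for the example and do not reproduce an AMR quantization table. The realized correction factor gain is taken as the ratio of the new to the old correction factor, and the four most recent realized gains are kept as the memory.

```python
import math

# Prediction coefficients for subframes n-1 ... n-4.
B = (0.68, 0.58, 0.34, 0.19)

# Hypothetical toy table of correction factors.
TOY_TABLE = [0.5, 0.7, 1.0, 1.4, 2.0, 2.8]


def select_scalar_index(table, old_index, g_t, realized_history):
    """Return (new_index, updated_history) for one subframe.

    realized_history holds the four previous realized correction
    factor gains, most recent first.
    """
    gamma_old = table[old_index]
    # Effect of the previous realized gains on the new prediction.
    pred_scale = 10.0 ** sum(b * math.log10(bh)
                             for b, bh in zip(B, realized_history))

    def error(i):
        return (g_t * gamma_old - pred_scale * table[i]) ** 2

    new_index = min(range(len(table)), key=error)
    realized = table[new_index] / gamma_old
    return new_index, [realized] + realized_history[:3]


if __name__ == "__main__":
    history = [1.0, 1.0, 1.0, 1.0]
    for _ in range(3):
        index, history = select_scalar_index(TOY_TABLE, 2, 2.0, history)
        print(index, history)
```

In the second subframe a smaller correction factor is selected than in the first one, because part of the desired gain is already delivered by the prediction.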

Next, the synthesizing method is described for the vector 
quantization. 

For the vector quantization case the error criterion used in
the AMR-encoder is more complicated, since the synthesis
filters are used. In view of the fact that there is no direct
access to the target x, it is approximated by $g_p y + g_c z$. Thus,
the error minimization with CDALC becomes:

$E_{VQ} = \| (g_p^{old} y + g_T g_c^{old} z) - g_p^{new} y - g_c^{new} z \|$

$E_{VQ} = \| (g_p^{old} - g_p^{new}) y + (g_T g_c^{old} - g_c^{new}) z \|$   (3.5)

$E_{VQ} = \| (g_p^{old} - g_p^{new}) y + g'_c (g_T \hat{\gamma}_{gc}^{old} - 10^{\sum_{i=1}^{4} b_i \log_{10}(\hat{\beta}(n-i))} \hat{\gamma}_{gc}^{new}) z \|$.

In addition to decoding the gains, both codebook vectors have
to be decoded and filtered with the LP-synthesis filter.
Therefore, LP-synthesis filter parameters have to be decoded.
This means that basically all the parameters have to be
decoded. In the AMR-encoder the codebook vectors are also
weighted by a specific weighting filter, but this was not done
for this CDALC error criterion.
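
The following sketch shows how such a synthesis-based error could be evaluated once the filtered codebook vectors are available. It is an illustration under stated assumptions: the names synthesis_error and select_vq_index are hypothetical, the table is assumed to hold pairs of adaptive codebook gain and fixed codebook gain (with the prediction already applied), and y and z are assumed to be given as plain lists of samples.

```python
import math


def synthesis_error(g_p_old, g_p_new, g_c_old, g_c_new, g_t, y, z):
    """Norm of (g_p_old - g_p_new) * y + (g_t * g_c_old - g_c_new) * z."""
    diff = [(g_p_old - g_p_new) * yi + (g_t * g_c_old - g_c_new) * zi
            for yi, zi in zip(y, z)]
    return math.sqrt(sum(d * d for d in diff))


def select_vq_index(table, old_index, g_t, y, z):
    """Return the table index with the smallest synthesis error."""
    g_p_old, g_c_old = table[old_index]
    return min(
        range(len(table)),
        key=lambda i: synthesis_error(g_p_old, table[i][0],
                                      g_c_old, table[i][1], g_t, y, z),
    )


if __name__ == "__main__":
    toy_table = [(0.5, 1.0), (0.5, 2.0), (0.8, 2.0)]  # hypothetical
    print(select_vq_index(toy_table, 0, 2.0, [1.0, 0.0], [0.0, 1.0]))
```

The cost of this variant lies outside the search itself: obtaining y and z requires decoding and filtering both codebook vectors, as stated above.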

Next, a second alternative of the gain manipulation is
described, which second alternative is referred to as
Quantization Error Minimization with Memory (memory method).

This criterion minimizes the quantization error while taking into
account the history of the previous correction factors. In
case of scalar quantization the error criterion is the same as
in the first alternative, i.e. the error function to be
minimized will be the same as in Equation 3.4. But for the
vector quantization the error function becomes a little easier
to evaluate.

Vector Quantization 

Starting from the error function derived for the first
alternative and given in Equation 3.5, minimizing the error of
the sum of two components will require decoding the y and z
vectors. Practically this means that the whole signal has to
be decoded. Instead of minimizing the norm of the error
vector, the error can be approximated by the sum of two error
components (which would be the case if both vectors y and z
are parallel to each other), namely the pitch gain error and
the fixed codebook gain error. Combining these components
using the Euclidean norm, the new error criterion can be
written as:

$E_{VQ} = \sqrt{ (g_p^{old} - g_p^{new})^2 \|y\|^2 + (g'_c)^2 (g_T \hat{\gamma}_{gc}^{old} - 10^{\sum_{i=1}^{4} b_i \log_{10}(\hat{\beta}(n-i))} \hat{\gamma}_{gc}^{new})^2 \|z\|^2 }$

$\Leftrightarrow E_{VQ}^2 / ((g'_c)^2 \|z\|^2) = ( \|y\| / (g'_c \|z\|) )^2 (g_p^{old} - g_p^{new})^2 + (g_T \hat{\gamma}_{gc}^{old} - 10^{\sum_{i=1}^{4} b_i \log_{10}(\hat{\beta}(n-i))} \hat{\gamma}_{gc}^{new})^2$.   (3.6)

The sum of the previous equation (Equation 3.5) is divided
into two components. However, the synthesized codebook vectors
still exist in the pitch gain error scaling term $\|y\| / (g'_c \|z\|)$. Due
to the synthesis, the pitch gain error scaling term is
complicated to compute. If it is computed, it would be more
efficient to use the synthesizing error minimization
criterion described in the first alternative. To get rid of
the synthesis procedure, the term $\|y\| / \|z\|$ is replaced by the
constant pitch gain error weight $w_{gp}$. The pitch gain error
weight has to be chosen carefully. If the weight is chosen to
be too big, the signal level will not change at all, since the
lowest error is found by choosing $g_p^{new} = g_p^{old}$. On the other hand, a
small weight will guarantee the desired codebook gain $\alpha$, but
it will give no guarantees for $g_p^{new}$, i.e.

$w_{gp} \to 0$ => minimization of term $|g_T \hat{\gamma}_{gc}^{old} - 10^{\sum_{i=1}^{4} b_i \log_{10}(\hat{\beta}(n-i))} \hat{\gamma}_{gc}^{new}|$

$w_{gp} \to \infty$ => minimization of term $|g_p^{new} - g_p^{old}|$
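
The influence of the pitch gain error weight can be illustrated with the sketch below. It assumes the variant in which the prediction is folded into one fixed weight, so that the error is the weighted squared pitch gain difference plus the squared correction factor error. The function name memory_method_index, the table and the weight values are hypothetical.

```python
import math

# Prediction coefficients for subframes n-1 ... n-4.
B = (0.68, 0.58, 0.34, 0.19)


def memory_method_index(table, old_index, g_t, realized_history, w_gp):
    """Index search with a fixed pitch gain error weight w_gp.

    table holds (pitch_gain, correction_factor) pairs.
    realized_history holds the four previous realized correction
    factor gains, most recent first.
    """
    g_p_old, gamma_old = table[old_index]
    pred_scale = 10.0 ** sum(b * math.log10(bh)
                             for b, bh in zip(B, realized_history))

    def error(i):
        g_p_new, gamma_new = table[i]
        pitch_err = w_gp * (g_p_old - g_p_new)
        fixed_err = g_t * gamma_old - pred_scale * gamma_new
        return pitch_err ** 2 + fixed_err ** 2

    return min(range(len(table)), key=error)


if __name__ == "__main__":
    toy_table = [(0.2, 1.0), (0.2, 1.4), (0.6, 2.0), (0.9, 2.0)]
    unity = [1.0, 1.0, 1.0, 1.0]
    print(memory_method_index(toy_table, 0, 2.0, unity, 0.0))
    print(memory_method_index(toy_table, 0, 2.0, unity, 10.0))
```

With a zero weight the entry with the exact correction factor is chosen regardless of the pitch gain. With a large weight the search stays among the entries that keep the pitch gain and picks the best correction factor available there.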



This algorithm using a fixed pitch gain weight requires decoding
(finding a value according to the received quantization index)
of both the pitch gain and the correction factor $\hat{\gamma}_{gc}$ and also
reconstructing the fixed codebook gain prediction $g'_c$. To be
able to construct the prediction, the fixed codebook vector
has to be decoded. Furthermore, the integer pitch lag is
needed for the pitch sharpening of the fixed codebook
excitation. The energy of the fixed codebook excitation is
required for the prediction (see Equation 3.1). If necessary,
the prediction can be included in the fixed weight, i.e.
$w_{gp} = \|y\| / (g'_c \|z\|)$, so that there is no need to decode the fixed
codebook vector. Presumably, this would not affect the
performance much. On the other hand, the energy of the fixed
codebook excitation can be estimated, since it is fairly
fixed. This allows the creation of a prediction without
decoding the fixed codebook vector.



The ranges of the terms $\|y\| / \|z\|$ and $\|y\| / (g'_c \|z\|)$ are demonstrated in Figs.
11 and 12 with male and child speech samples using AMR mode
12.2 kbit/s. The value depends strongly on the energy of the
signal. Hence, it would be beneficial to make the pitch gain
error weight $w_{gp}$ adaptive instead of using a constant value.
For example, the value may be determined using short time
signal energy.



Fig. 13 shows a flow chart generally illustrating the method
of enhancing a coded audio signal comprising coded speech
and/or coded noise according to the invention. The coded audio
signal comprises indices which represent speech parameters
and/or noise parameters which comprise at least a first
parameter for adjusting a first characteristic of the audio
signal, such as the level of synthesized speech and/or noise.

In step S1 in Fig. 13 a current first parameter value is
determined from an index corresponding to at least the first
parameter, e.g. the fixed codebook gain correction factor $\hat{\gamma}_{gc}$.
In step S2 the current first parameter value is adjusted, e.g.
multiplied by $\alpha$, in order to achieve an enhanced first
characteristic, thereby obtaining an enhanced first parameter
value $\alpha \cdot \hat{\gamma}_{gc}^{old}$. Finally, in step S3 a new index value is
determined from a table relating index values to at least
first parameter values, e.g. a quantization table, such that a
new first parameter value corresponding to the new index value
substantially matches the enhanced first parameter value.

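According to the above-described embodiment, a new index value
for $\alpha \cdot \hat{\gamma}_{gc}^{old}$ is searched such that the expression $|\alpha \cdot \hat{\gamma}_{gc}^{old} - \hat{\gamma}_{gc}^{new}|$ is
minimized, $\hat{\gamma}_{gc}^{new}$ being the new first parameter value
corresponding to the searched new index value.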
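
Steps S1 to S3 can be summarized in a few lines, as in the sketch below. The function name enhance_index and the table TOY_TABLE are assumptions for illustration only; the table stands in for whatever scalar quantization table the codec mode defines.

```python
# Hypothetical toy quantization table (roughly 3 dB steps).
TOY_TABLE = [0.5, 0.71, 1.0, 1.41, 2.0, 2.83, 4.0]


def enhance_index(index, table, alpha):
    """Map a current index to a new index for a gain alpha."""
    current = table[index]        # S1: current first parameter value
    enhanced = alpha * current    # S2: enhanced first parameter value
    # S3: index whose table value is closest to the enhanced value
    return min(range(len(table)), key=lambda i: abs(enhanced - table[i]))


if __name__ == "__main__":
    print(enhance_index(2, TOY_TABLE, 2.0))
    print(enhance_index(6, TOY_TABLE, 2.0))
```

When the enhanced value lies outside the table range, the search returns the nearest end of the table, so the realized gain is then smaller than the desired one.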

Moreover, according to the present invention, a current second 
parameter value may be determined from the index further 
corresponding to a second parameter such as the adaptive 
codebook gain controlling a second characteristic of speech. 
In this case, the new index value is determined from the table 
further relating the index values to second parameter values, 
e.g. a vector quantization table, such that a new second 
parameter value corresponding to the new index value 
substantially matches the current second parameter value.

According to the above-described embodiment, a new index value
for $\alpha \cdot \hat{\gamma}_{gc}^{old}$ and $g_{p\_old}$ is searched such that the expression

$|\alpha \cdot \hat{\gamma}_{gc}^{old} - \hat{\gamma}_{gc}^{new}| + weight \cdot |g_{p\_new} - g_{p\_old}|$

is minimized. $g_{p\_new}$ is the new
second parameter value corresponding to the new index value.

The "weight" can be > 1, so that the new index value is determined
from the table such that substantially matching the current
second parameter value has precedence.

Fig. 14 shows a schematic block diagram illustrating an 
apparatus 100 for enhancing a coded audio signal according to 
the invention. The apparatus receives a coded audio signal 
which comprises indices which represent speech and/or noise 
parameters which comprise at least a first parameter for 
adjusting a first characteristic of the audio signal. The 
apparatus comprises a parameter value determination block 11 
for determining a current first parameter value from an index 
corresponding to at least the first parameter, an adjusting 
block 12 for adjusting the current first parameter value in 
order to achieve an enhanced first characteristic, thereby 
obtaining an enhanced first parameter value, and an index 
value determination block 13 for determining a new index value
from a table relating index values to at least first parameter 
values, such that a new first parameter value corresponding to 
the new index value substantially matches the enhanced first 
parameter value. 

The parameter value determination block 11 may further 
determine a current second parameter value from the index 
further corresponding to a second parameter, and the index 
value determination block 13 may then determine the new index 
value from the table further relating the index values to 
second parameter values, such that a new second parameter 
value corresponding to the new index value substantially
matches the current second parameter value. Thus, the index
value is optimized simultaneously for both the first and 
second parameters. 

The index value determination block 13 may determine the new 
index value from the table such that substantially matching 
the current second parameter value has precedence. 

The apparatus 100 may further include replacing means for 
replacing a current value of the index corresponding to the at 
least first parameter by the determined new index value, and
may output enhanced coded speech containing the new index value.

Referring to Figs. 13 and 14, the first parameter value may be 
the background noise level parameter value which is determined 
and adjusted and for which a new index value is determined in 
order to adjust the background noise level. 

Alternatively, the second parameter value may be the 
background noise level parameter, the index value of which is
determined in accordance with the adjusted speech level. 

As discussed beforehand, the speech level manipulation 
requires also manipulating the background noise level 
parameter during speech pauses in DTX. 

According to the AMR codec, the background noise level 
parameter, the averaged logarithmic frame energy, is quantized 
with 6 bits. The comfort noise level can be adjusted by 
changing the energy index value. The level can be adjusted in
steps of 1.5 dB, so finding a suitable comfort noise level
corresponding to the change of the speech level is possible. 

The evaluated comfort noise parameters (the average LSF (Line
Spectral Frequency) parameter vector and the averaged
logarithmic frame energy $en_{log}^{mean}$) are encoded into a special
frame, called a Silence Descriptor (SID) frame, for
transmission to the receiver side. The parameters give
information on the level ($en_{log}^{mean}$) and the spectrum ($f^{mean}$) of the
background noise. More details can be found in 3GPP TS 26.093
V4.0.0 (2001-03), "3rd Generation Partnership Project;
Technical Specification Group Services and System Aspects;
Mandatory Speech Codec speech processing functions; AMR speech
codec; Source controlled rate operation (Release 6)".

The frame energy is computed for each frame marked with Voice
Activity Detector VAD=0 according to the equation:

$en_{log}(i) = \frac{1}{2} \log_2 ( \frac{1}{N} \sum_{n=0}^{N-1} x^2(n) )$,

where x is the HP-filtered input speech signal of the current
frame i and N is the number of samples in the frame. The averaged
logarithmic energy, which will be transmitted, is computed by:

$en_{log}^{mean}(i) = \frac{1}{8} \sum_{n=0}^{7} en_{log}(i-n)$.

The averaged logarithmic energy is quantized by means of a 6
bit algorithmic quantizer. Quantization is performed using the
quantization function, as defined in 3GPP TS 26.104 V4.1.0
(2001-06), "AMR Floating-point Speech Codec C-source".

$index = \lfloor (en_{log}^{mean}(i) + 2.5) \cdot 4 + 0.5 \rfloor$,

where the value of the index is restricted to a range [0...63],
i.e. in a range of 6 bits. The index can be computed using
base 10 logarithm as follows:

$index = \lfloor (en_{log}^{mean}(i) + 2.5) \cdot 4 + 0.5 \rfloor = \lfloor 4 \cdot en_{log}^{mean}(i) + 10.5 \rfloor$

$index \approx \lfloor \frac{2}{10 \log_{10} 2} \cdot 10 \log_{10} en^{mean}(i) + 10.5 \rfloor$,

where $10 \log_{10} en^{mean}(i)$ is the energy in decibels. Therefore, it is
shown that one quantization step corresponds to approximately
1.5 dB.
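
The quantization function and the resulting step size can be reproduced with the short sketch below. The function names are hypothetical, and clamping to the range 0 to 63 is written out explicitly as one way of enforcing the stated 6 bit range. The step size follows from the fact that one index step corresponds to 1/4 in the logarithmic energy, i.e. a factor of the square root of 2 in energy.

```python
import math


def quantize_energy_index(en_log_mean):
    """index = floor((en_log_mean + 2.5) * 4 + 0.5), limited to 0..63."""
    index = math.floor((en_log_mean + 2.5) * 4.0 + 0.5)
    return max(0, min(63, index))


def db_per_index_step():
    """Energy change in dB corresponding to one index step."""
    # One step is 0.25 in en_log, i.e. 0.5 in log2(energy).
    return 10.0 * math.log10(2.0) * 0.5


if __name__ == "__main__":
    print(quantize_energy_index(0.0))
    print(round(db_per_index_step(), 3))
```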

In the following the gain adjustment of the comfort noise 
parameters is described. 

Since an energy parameter is transmitted, the signal energy
can be manipulated directly by modifying the energy
parameters. As shown above, one quantization step equals
1.5 dB. Assuming that all eight frames of a SID update
interval will be scaled by $\alpha$, the new index can be found as
follows:

$index^{new} = \lfloor (en_{log}^{mean}(i) + \frac{1}{2} \log_2 \alpha^2 + 2.5) \cdot 4 + 0.5 \rfloor = \lfloor 4 \cdot en_{log}^{mean}(i) + 10.5 + 4 \log_2 \alpha \rfloor$.

Because the old index was

$index = \lfloor 4 \cdot en_{log}^{mean}(i) + 10.5 \rfloor$,

the new index can be approximated by

$index^{new} \approx \lfloor 4 \log_2 \alpha \rfloor + index$.
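
The index update can be written as a one-line computation, sketched below. The function name adjust_sid_energy_index is an assumption, and limiting the result to the range 0 to 63 is added here so that the new index still fits into 6 bits.

```python
import math


def adjust_sid_energy_index(index, alpha):
    """New energy index for a signal gain alpha (linear amplitude)."""
    new_index = index + math.floor(4.0 * math.log2(alpha))
    return max(0, min(63, new_index))


if __name__ == "__main__":
    print(adjust_sid_energy_index(20, 2.0))   # gain of about +6 dB
    print(adjust_sid_energy_index(20, 0.5))   # gain of about -6 dB
```

A gain of 2 moves the index by four steps, which agrees with four quantization steps of approximately 1.5 dB each.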

Referring back to Figs. 13 and 14, a parameter value to be
adjusted may be the comfort noise parameter value.
Accordingly, a new index value $index^{new}$ is determined as
mentioned above. In other words, a current background noise
parameter index value index may be detected, and a new
background noise parameter index value $index^{new}$ may be
determined by adding $\lfloor 4 \log_2 \alpha \rfloor$ to the current background noise
parameter index value index, wherein $\alpha$ corresponds to the
enhancement of the first characteristic represented by the
first speech parameter.

The level of the synthesized speech signal can be adjusted by 
manipulating the fixed codebook gain factor index, as shown 
previously. Being a measure of prediction error, the
fixed codebook gain factor index does not reveal the level
of the speech signal. Therefore, to control the gain
manipulation, i.e. to determine whether the level should be
changed, the speech signal level must first be estimated.
In TFO, the six or seven MSB of the PCM speech samples (not
compressed) are transmitted to the far end unchanged, to 
facilitate a seamless TFO interruption. These six or seven MSB 
can be used to estimate the speech level. 

If these PCM speech samples are unavailable, the coded speech 
signal must be at least partially decoded (post-filtering is 
not necessary) to estimate the speech level. 

Alternatively, there is the possibility of using a fixed gain, 
thereby avoiding a complete decoding. Fig. 15 shows a block 
diagram illustrating a scheme with the possibility of using a 
constant gain in the gain manipulation described above. In 
this case, decoding PCM signals out of the codec signal for 
using the PCM signals in the gain estimation (i.e. speech 
level estimation) is not required. The speech may be coded 
with e.g. AMR, AMR-WB (AMR Wide Band), GSM FR, GSM EFR or GSM HR
speech codecs.

Fig. 16 shows a high level implementation example of the 
present invention in an MGW (Media GateWay) of the 3G network 
architecture. For example, the present invention may be
implemented in a DSP (Digital Signal Processor) of the MGW.
However, it is to be noted that the implementation of the 
invention is not limited to an MGW. 

As shown in Fig. 16, coded speech is fed to the MGW. The coded 
speech comprises at least one index corresponding to a value 
of a speech parameter which adjusts the level of synthesized 
speech. This index may also indicate a value of another speech 
parameter which is affected by the speech parameter for 
adjusting the level of synthesized speech. For example, this 
other speech parameter adjusts the periodicity or pitch of the 
synthesized speech. 

In a VED (Voice Enhancement Device) shown in Fig. 16, the 
index is controlled so as to adjust the level of the speech to 
a desired level. A new index indicating values of the speech 
parameters affecting the level of the speech, such as the 
fixed codebook gain factor and adaptive codebook gain, is 
determined by minimizing an error between the desired level 
and the realized effective level. As a result, the new index 
is found which indicates values of the speech parameters 
realizing the desired level of speech. The original index is 
replaced by the new index and enhanced coded speech is output. 

It is to be noted that the partial decoding of speech shown in 
Fig. 16 relates to controlling means for determining a current 
level of speech to decide whether the level should be 
adjusted. 

The above described embodiments of the present invention may 
not only be utilized in level control itself, but also in 
noise suppression and echo control (nonlinear processing) in 
the coded domain. Noise suppression can utilize the above 
technique by e.g. adjusting the comfort noise level during
speech pauses. Echo control may utilize the above technique
e.g. by attenuating the speech signal during echo bursts. 

The present invention is not intended to be limited only to
TFO and TrFO voice communication and to voice communication 
over packet-switched networks, but rather to comprise 
enhancing coded audio signals in general. The invention finds 
application also in enhancing coded audio signals related e.g. 
to audio/speech/multimedia streaming applications and to MMS 
(Multimedia Messaging Service) applications. 

It is to be understood that the above description is 
illustrative of the invention and is not to be construed as 
limiting the invention. Various modifications and applications 
may occur to those skilled in the art without departing from 
the scope of the invention as defined by the appended claims.

CLAIMS



1. A method of enhancing a coded audio signal comprising 
indices which represent audio signal parameters which comprise 
at least a first parameter for adjusting a first 
characteristic of the audio signal, and a second parameter, the 
method comprising the steps of: 

determining a current first parameter value from an index 
corresponding to the first parameter;

adjusting the current first parameter value in order to
achieve an enhanced first characteristic, thereby obtaining an 
enhanced first parameter value; 

determining a current second parameter value from the 
index further corresponding to the second parameter; and 

determining a new index value from a table relating index 
values to first parameter values and further relating the 
index values to second parameter values, such that a new first 
parameter value corresponding to the new index value and a new 
second parameter value corresponding to the new index value 
substantially match the enhanced first parameter value and the 
current second parameter value, respectively. 

2. A method of enhancing a coded audio signal comprising 
indices which represent audio signal parameters which comprise 
at least a first parameter for adjusting a first 
characteristic of the audio signal and a background noise 
parameter, the method comprising the steps of: 

determining a current first parameter value from an index 
corresponding to at least the first parameter; 

adjusting the current first parameter value in order to 
achieve an enhanced first characteristic, thereby obtaining an 
enhanced first parameter value; 

determining a new index value from a table relating index 
values to at least first parameter values, such that a new
first parameter value corresponding to the new index value
substantially matches the enhanced first parameter value; 

detecting a current background noise parameter index 
value; and 

determining a new background noise parameter index value 
corresponding to the enhancement of the first characteristic. 

3. The method according to claim 2, further comprising the
steps of: 

determining a current second parameter value from the 
index further corresponding to a second parameter; and 

determining the new index value from the table further 
relating the index values to second parameter values, such 
that the new first parameter value corresponding to the new 
index value and a new second parameter value corresponding to 
the new index value substantially match the enhanced first 
parameter value and the current second parameter value, 
respectively. 

4. The method according to claim 1, further comprising the 
steps of: 

detecting a current background noise parameter index 
value; and 

determining a new background noise parameter index value 
corresponding to the enhancement of the first characteristic. 

5. The method according to claim 1, 3 or 4, wherein the new 
index value is determined from the table such that 
substantially matching the current second parameter value has 
precedence. 

6. The method according to any one of the preceding claims, 
further comprising the step of:

replacing a current value of the index corresponding to
the at least first parameter by the determined new index 
value. 

7. An apparatus for enhancing a coded audio signal comprising 
indices which represent audio signal parameters which comprise 
at least a first parameter for adjusting a first 
characteristic of the audio signal and a second parameter, the 
apparatus comprising: 

parameter value determination means for determining a 
current first parameter value from an index corresponding to 
the first parameter and for determining a current second 
parameter value from the index further corresponding to the 
second parameter; 

adjusting means for adjusting the current first parameter
value in order to achieve an enhanced first characteristic,
thereby obtaining an enhanced first parameter value; and 

index value determination means for determining a new 
index value from a table relating index values to first 
parameter values and further relating the index values to 
second parameter values, such that a new first parameter value 
corresponding to the new index value and a new second 
parameter value corresponding to the new index value 
substantially match the enhanced first parameter value and the 
current second parameter value, respectively. 

8. An apparatus for enhancing a coded audio signal comprising 
indices which represent audio signal parameters which comprise 
at least a first parameter for adjusting a first 
characteristic of the audio signal and a background noise 
parameter, the apparatus comprising: 

parameter value determination means for determining a 
current first parameter value from an index corresponding to 
at least the first parameter;

adjusting means for adjusting the current first parameter
value in order to achieve an enhanced first characteristic, 
thereby obtaining an enhanced first parameter value; 

index value determination means for determining a new 
index value from a table relating index values to at least 
first parameter values, such that a new first parameter value 
corresponding to the new index value substantially matches the 
enhanced first parameter value; 

detecting means for detecting a current background noise 
parameter index value; and 

determining means for determining a new background noise 
parameter index value corresponding to the enhancement of the 
first characteristic. 

9. The apparatus according to claim 8, wherein 

said parameter value determination means is further 
arranged to determine a current second parameter value from 
the index further corresponding to a second parameter; and 

said index value determination means is further arranged 
to determine the new index value from the table further 
relating the index values to second parameter values, such 
that the new first parameter value corresponding to the new 
index value and a new second parameter value corresponding to 
the new index value substantially match the enhanced first 
parameter value and the current second parameter value, 
respectively.

10. The apparatus according to claim 7, further comprising: 

detecting means for detecting a current background noise 
parameter index value; and 

determining means for determining a new background noise 
parameter index value corresponding to the enhancement of the 
first characteristic.

11. The apparatus according to claim 7, 9 or 10, wherein the
index value determination means is arranged to determine the 
new index value from the table such that substantially 
matching the current second parameter value has precedence. 

12. The apparatus according to any one of claims 7 to 11,
further comprising: 

replacing means for replacing a current value of the index 
corresponding to the at least first parameter by the 
determined new index value. 

13. A method of enhancing a coded audio signal comprising 
indices which represent audio signal parameters, the method 
comprising the steps of: 

detecting a characteristic of the audio signal;

detecting a current background noise parameter index
value; and 

determining a new background noise parameter index value 
corresponding to the detected characteristic of the audio 
signal. 

14. An apparatus for enhancing a coded audio signal comprising 
indices which represent audio signal parameters, the apparatus 
comprising: 

detecting means for detecting a characteristic of the 
audio signal; 

detecting means for detecting a current background noise 
parameter index value; and 

determining means for determining a new background noise 
parameter index value corresponding to the detected 
characteristic of the audio signal. 

15. A method of enhancing a coded audio signal comprising 
indices which represent audio signal parameters which comprise 
at least a first parameter for adjusting a first
characteristic of the audio signal, a second parameter and a
background noise parameter, the method comprising the steps
of:

determining a current first parameter value from an index 
corresponding to the first parameter; 

adjusting the current first parameter value in order to 
achieve an enhanced first characteristic, thereby obtaining an 
enhanced first parameter value; 

determining a current second parameter value from the 
index further corresponding to the second parameter; 

determining a new index value from a table relating index 
values to first parameter values and further relating the 
index values to second parameter values, such that a new first 
parameter value corresponding to the new index value and a new 
second parameter value corresponding to the new index value 
substantially match the enhanced first parameter value and the 
current second parameter value, respectively; 

detecting a current background noise parameter index 
value; and 

determining a new background noise parameter index value 
corresponding to the enhancement of the first characteristic. 

16. An apparatus for enhancing a coded audio signal comprising 
indices which represent audio signal parameters which comprise 
at least a first parameter for adjusting a first 
characteristic of the audio signal, a second parameter and a 
background noise parameter, the apparatus comprising: 

parameter value determination means for determining a 
current first parameter value from an index corresponding to 
the first parameter and for determining a current second 
parameter value from the index further corresponding to the 
second parameter; 

adjusting means for adjusting the current first parameter 
value in order to achieve an enhanced first characteristic, 
thereby obtaining an enhanced first parameter value; 



index value determination means for determining a new 
index value from a table relating index values to first 
parameter values and further relating the index values to 
second parameter values, such that a new first parameter value 
corresponding to the new index value and a new second 
parameter value corresponding to the new index value 
substantially match the enhanced first parameter value and the 
current second parameter value, respectively;

detecting means for detecting a current background noise 
parameter index value; and 

determining means for determining a new background noise 
parameter index value corresponding to the enhancement of the 
first characteristic. 

17. A computer program product, comprising software code 
portions for performing the steps of any one of claims 1 to 6, 
13 and 15 when the product is run on a computer. 

18. The computer program product according to claim 17, 
wherein said computer program product comprises a computer- 
readable medium on which said software code portions are 
stored. 



19. The computer program product according to claim 17, 
wherein said computer program product is directly loadable 
into the internal memory of the computer.
ABSTRACT

The invention relates to enhancing a coded audio signal 
comprising indices which represent audio signal parameters 
which comprise at least a first parameter for adjusting a 
first characteristic of speech. A current first parameter 
value is determined from an index corresponding to at least 
the first parameter, the current first parameter value is 
adjusted in order to achieve an enhanced first characteristic, 
thereby obtaining an enhanced first parameter value, and a new 
index value is determined from a table relating index values 
to at least first parameter values, such that a new first 
parameter value corresponding to the new index value 
substantially matches the enhanced first parameter value.



(Fig. 14) 